Topics on statistical design and analysis of cDNA microarray experiment
Abstract
A microarray is a powerful tool for surveying the expression levels of many thousands of genes simultaneously. It is one of the new genomics technologies with important applications in the biological, agricultural and pharmaceutical sciences. In this thesis, we focus on the dual-channel cDNA microarray, one of the most popular microarray technologies, and discuss three topics:
• Optimal experimental design,
• Dye effect normalization,
• Estimating the proportion of true null hypotheses, the local false discovery rate (lFDR) and the positive false discovery rate (pFDR).
The first topic consists of four subtopics, each of which addresses an independent, practical problem in cDNA microarray experimental design. In the first subtopic, we propose an optimization strategy, based on the simulated annealing method of Wit et al. (2005), to find optimal or near-optimal designs with both biological and technical replicates. In the second subtopic, we discuss how to apply the Q-criterion to the factorial design of microarray experiments. In the third subtopic, we suggest an optimal way of pooling samples, which is in effect a replication scheme that minimizes the variance of the experiment under the constraint that the total cost is fixed at a certain level. In the fourth subtopic, we argue that the criterion for the distant pair design (Fu and Jansen, 2005) is not appropriate and propose an alternative criterion.
The second topic of this thesis is dye effect normalization. In cDNA microarray technology, each array compares two samples, which are usually labelled with the different dyes Cy3 and Cy5. The technology assumes that, for a given gene (spot) on the array, if the Cy3-labelled sample has k times as much of a transcript as the Cy5-labelled sample, then the Cy3 signal should be k times as high as the Cy5 signal, and vice versa. This important assumption requires that the two dyes have the same properties.
However, in reality the Cy3 and Cy5 dyes have slightly different properties, and the relative efficiency of the dyes varies across the intensity range in a "banana-shaped" way. To remove the dye effect, we propose a novel dye effect normalization method based on modeling the dye response functions and the dye effect curve. Real and simulated microarray data sets are used to evaluate the method, and the results show that its performance is satisfactory.
The focus of the third topic is the estimation of the proportion of true null hypotheses, the lFDR and the pFDR. In a typical microarray experiment, expression levels are measured for a large number of genes. To find differentially expressed genes, these variables are usually screened simultaneously by a statistical test. Since this is a case of multiple hypothesis testing, some adjustment should be made to the p-values resulting from the test. Several multiple testing error rates, such as the FDR, lFDR and pFDR, have been proposed to address this issue. A key related problem is the estimation of the proportion of true null hypotheses (i.e. non-differentially expressed genes). To model the distribution of the p-values, we propose three kinds of finite mixture with an unknown number of components (the
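As a rough illustration of the proportion-of-true-nulls problem described in the abstract, the sketch below uses a simple tail-count (Storey-type) estimator. This is a swapped-in technique chosen for brevity, not the finite mixture approach proposed in the thesis. The function names, the threshold `lam`, and the Beta(0.2, 5) distribution used to simulate p-values from differentially expressed genes are all assumptions made for the example.

```python
import random


def estimate_pi0(p_values, lam=0.5):
    """Tail-count (Storey-type) estimate of the proportion of true nulls.

    Assumption: p-values of true nulls are uniform on [0, 1], while
    p-values of true alternatives rarely exceed the threshold `lam`.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    # About pi0 * m * (1 - lam) null p-values are expected above lam,
    # so the scaled tail count estimates pi0 (capped at 1).
    tail_count = sum(1 for p in p_values if p > lam)
    return min(1.0, tail_count / (m * (1.0 - lam)))


def simulate_p_values(m=10000, pi0=0.8, seed=0):
    """Simulate p-values: uniform nulls plus hypothetical alternatives."""
    rng = random.Random(seed)
    m0 = int(round(m * pi0))
    nulls = [rng.random() for _ in range(m0)]
    # Hypothetical alternative distribution concentrated near zero.
    alternatives = [rng.betavariate(0.2, 5.0) for _ in range(m - m0)]
    return nulls + alternatives


p_values = simulate_p_values()
pi0_hat = estimate_pi0(p_values)
print(f"estimated proportion of true nulls: {pi0_hat:.3f}")
```

With 80% of the simulated genes being true nulls, the estimate should land close to 0.8. The estimator is conservative by construction, since any alternative p-values above `lam` inflate the tail count.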
Similar resources
Review: Practical Design and Analysis of 2-Colour cDNA Microarray Experiments
This review paper is aimed at biological researchers who are interested in or have begun to use cDNA microarrays for their investigations. Large microarray studies typically involve a multidisciplinary team with various groups performing different aspects of the same experiment. This approach means that microarrays are less accessible to new researchers than more traditional biological techniq...
Transformations, background estimation, and process effects in the statistical analysis of microarrays
Microarray technology has made available large data sets that can provide information on gene expression when cells are subjected to various treatments. Before proceeding with a formal statistical analysis, many biological and procedural aspects should be considered. These aspects may guide the analysis and subsequent statistical inference. Several of these issues are discussed in connection wi...
Comparison of different methodologies to identify differentially expressed genes in two-sample cDNA microarrays
A two-sample microarray design aims at identifying genes expressed differentially in two-sample cDNA arrays. A two-sample experiment is a commonly used design to compare relative mRNA abundance between two different samples. Several statistical techniques are available for such designs. For the identification of differentially expressed genes, four methods were compared: a fold test, a t-test [...
Microarray analysis of gene expression patterns in Arabidopsis seedlings under trehalose, sucrose and sorbitol treatment
Trehalose is the non-reducing alpha,alpha-1,1-linked glucose disaccharide. The biosynthesis precursor of trehalose, trehalose-6-phosphate (T6P), is essential for plant development, growth and carbon utilization, and alters photosynthetic capacity, but its mode of action is not understood. In the current research, 6-day-old seedlings of Arabidopsis thaliana (Columbia ecotype) were grown in liquid cultu...
Direct versus indirect designs for cDNA microarray experiments
We calculate the variances of two classes of estimates of differential gene expression based on log ratios of fluorescence intensities from cDNA microarray experiments: direct estimates, using measurements from the same slide, and indirect estimates, using measurements from different slides. These variances are compared and numerical estimates are obtained from a small experiment involving 4 sl...
Publication date: 2009